A 5-emotions stimuli set for emotion perception research with full-body dance movements

Ekman famously contended that there are different channels of emotional expression (face, voice, body), and that emotion recognition ability confers an adaptive advantage to the individual. Yet, still today, much emotion perception research is focussed on emotion recognition from the face, and few validated emotionally expressive full-body stimuli sets are available. Based on research on emotional speech perception, we created a new, highly controlled full-body stimuli set. We used the same-sequence approach, and not emotional actions (e.g., jumping for joy, recoiling in fear): One professional dancer danced 30 sequences of (dance) movements five times each, expressing joy, anger, fear, sadness or a neutral state, one at each repetition. We outline the creation of a total of 150 such 6-s-long video stimuli, which show the dancer as a white silhouette on a black background. Ratings from 90 participants (emotion recognition, aesthetic judgment) showed that the intended emotion was recognized above chance (chance: 20%; joy: 45%, anger: 48%, fear: 37%, sadness: 50%, neutral state: 51%), and that aesthetic judgment was sensitive to the intended emotion (beauty ratings: joy > anger > fear > neutral state, and sad > fear > neutral state). The stimuli set, normative values and code are available for download.

Objectives. The objectives of this project were, first, to create a new stimulus set with a high level of experimental control. Dance movement sequences and visual characteristics of the stimuli were controlled, and stimulus length was equalized as closely as possible to 6 s. Second, we set out to provide normative values of emotion recognition and aesthetic judgment for all created stimuli. Third, we identified the stimuli with the highest emotion recognition rates that were also recognized above chance, to provide a stimuli table with all values for future stimulus selection. Fourth, we explored interindividual differences in emotion recognition and beauty ratings (personality traits and aesthetic responsiveness).
The present study. We designed and created a new dance movement stimuli set based on the groundwork from previous stimulus creation procedures of dance stimuli sets 32,37,48-53 , which ensured requirements for experimental control 31,54 . During the subsequent norming experiment, 90 participants watched the stimuli
Figure 1. Stimuli creation procedure. The stimuli creation procedure was based on previous work 32,37,48-53 , and respected requirements of experimental control for dance stimulus materials 31,54 . Choreography of the 30 sequences (of Western contemporary and ballet dance) took place prior to the recording session and was led entirely by the dancer in conversation with two of the authors with professional dance experience (JFC and LSE). Filming of the dance sequences took place at the Max-Planck-Institute for Empirical Aesthetics in Frankfurt/M. For filming, a Canon EOS 5D Mark IV camera was used, with a Canon EF 24-105 mm f/4 L IS USM lens (settings, e.g.: raw frame rate: 50 fps; output frame rate: 25 fps; white balance: 5000 K; shutter speed: 1/100 s; ISO: 400; video format: H.264; aspect ratio: 16:9; resolution: 1920 × 1080). A standard 6 × 3 m chroma-key greenscreen background was used to allow for the creation of additional visual preparations of the stimuli, such as silhouette videos and blurred faces. For this, dedo-stage lights (7 dedo heads, dimmers and stands kit) were required to illuminate the entire greenscreen and to minimise shadows. Postproduction was done using Adobe After Effects 2019 and Adobe Premiere Pro 2019. All footage was trimmed to the exact start and end points of the movements. Each clip was rendered into a separate file in an uncompressed format and the title was added, as specified verbally by the dancer during the recording. Before saving, the sound tracks (speech and ambient noise) of the clips were removed.
Using Adobe After Effects, the "Keylight" effect was added to all files, and the background was removed from each clip. The "Level" effect (setting: output black = 255) was further applied to each clip to colour the extracted foregrounds white (the visible dancer silhouette). "Opacity" keyframes were then added to the beginning and the end of each clip to allow for a fade-in and fade-out of each clip (8 frames). Finally, each clip was rendered as a separate file in H.264 format. The dancer was Ms Anne Jung, and her informed consent for publication of identifying information, images and film in an online open-access publication was obtained. A short video of the creation process is available here: https://www.youtube.com/watch?v=Eij40jtw8WE.
Preliminary data analyses. During stimuli creation, some sequences were performed more than once.
These were cases where the dancer was unsatisfied with her performance and asked to repeat the sequence. Therefore, the total number of stimuli was 173 (including 23 duplicates that were deleted once emotion recognition rates were obtained). The 173 stimuli were divided into three sets for three separate online experiments. Fifteen videos of the stimuli set were randomly selected and included in all three separate online norming experiments. To confirm that emotion recognition rates between the three sets of stimuli were equivalent, we performed comparative analyses. These showed equivalent emotion recognition rates and aesthetic judgment; hence, data from the three experiments were aggregated and duplicates were removed, retaining the take with the highest emotion recognition rate. The comparative analyses are set out in the supplementary materials (section 1).
Emotion recognition accuracy. Data were non-normally distributed and non-parametric tests were performed.
In addition, we explored emotion recognition accuracy across the different categories of intended emotions in terms of correct classifications and misclassifications. The most frequent confusions between emotions were as follows: stimuli intended to express joy were most often misclassified as neutral state, i.e., in 23.1% of trials (correct classifications: 49.2%), anger as joy in 24.2% of trials (correct classifications: 50.5%), fear as sadness in 25.5% of trials (correct classifications: 39.5%), sadness as neutral state in 23.8% of trials (correct classifications: 47.5%), and neutral state as sadness in 20.4% of trials (correct classifications: 52.61%). See Table 2 for a confusion matrix.
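The confusion percentages reported above can be derived from trial-level responses by cross-tabulating intended against chosen emotion and normalising each row. The following is a minimal sketch of that tabulation, not the analysis code used for the study; the function names, the emotion labels and the toy trials are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Response options in the forced-choice task (label strings are assumed).
EMOTIONS = ["joy", "anger", "fear", "sadness", "neutral"]

def confusion_matrix(trials):
    """Return row-normalised percentages: intended -> chosen -> % of trials.

    `trials` is an iterable of (intended_emotion, chosen_emotion) pairs.
    """
    counts = defaultdict(Counter)
    for intended, chosen in trials:
        counts[intended][chosen] += 1
    matrix = {}
    for intended in EMOTIONS:
        total = sum(counts[intended].values())
        matrix[intended] = {
            chosen: (100.0 * counts[intended][chosen] / total if total else 0.0)
            for chosen in EMOTIONS
        }
    return matrix

def top_confusion(matrix, intended):
    """Most frequent incorrect response for one intended emotion."""
    wrong = {c: p for c, p in matrix[intended].items() if c != intended}
    return max(wrong, key=wrong.get)

# Made-up toy trials for illustration only; these are NOT the study's data.
toy_trials = (
    [("fear", "fear")] * 4 + [("fear", "sadness")] * 3 + [("fear", "neutral")] * 1
    + [("joy", "joy")] * 5 + [("joy", "neutral")] * 2 + [("joy", "anger")] * 1
)
toy_matrix = confusion_matrix(toy_trials)
```

In this toy example the diagonal cell of a row is the correct-classification rate, and `top_confusion(toy_matrix, "fear")` picks out the largest off-diagonal cell of that row.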

Subjective emotion recognition.
To explore how intensity and beauty ratings were distributed when using participants' subjective emotion judgment (i.e., participants' subjective perception of emotion, regardless of intended emotional expression by the dancer), the above analyses were repeated with the subjective emotion perception as grouping variable. No large differences between the two types of classifications were observed, as subjective perception and intended expression mostly overlapped. See supplementary materials section 2, for those analyses.
Interindividual differences in emotion recognition and aesthetic judgement. We next explored how interindividual differences modulated emotion recognition, intensity ratings and aesthetic judgment.
Only the personality trait conscientiousness predicted emotion recognition accuracy (conscientious individuals scored higher on the emotion recognition task). Intensity and beauty ratings were positively predicted by our overall engagement variable ("how interesting did you find this task?" 0 = not at all; 100 = very much; see "Methods" section). Beauty ratings were additionally predicted negatively by the personality trait negative emotionality. These regression analyses are set out in the supplementary materials (section 3). Regarding our variable dance experience, our sample had not been specifically recruited with this variable in mind. But because important previous research with dance professionals has shown links between dance experience and other neurocognitive processes 35,55-59 , dance experience data were collected as a means of experimental control. Participants' average dance experience was very low (1.6 years; SD = 4.55), with many participants having none at all (81.1%, range = 0-30). As could be expected, this variable showed no effects on emotion recognition, or on beauty or intensity ratings (see supplementary materials, section 3).
www.nature.com/scientificreports/
Technical test. As a 'technical test' of the stimuli, we proceeded to inspect the emotion recognition rate for each stimulus. Of the 150 final stimuli, 139 had been recognized above the chance level of 20%. We propose that any stimulus that was not recognized at a rate of at least 20% should not be used in subsequent experiments. For stimulus selection in subsequent experiments, to leave sequences intact (i.e., where all five stimuli of a sequence have been recognized above chance level), we provide a table with all information about each sequence and each of the stimuli composing a sequence.
We propose that only sequences where the intended emotional expression of all five stimuli has been recognized above chance level should be included in an experiment. A total of 22 sequences include stimuli that were all recognized above chance level, i.e., a total of 110 stimuli. Table 3 shows the N = 150 stimuli of the stimuli set with their average Emotion Recognition Accuracy, Intensity Rating and Beauty Rating. Emotion Recognition Accuracies of stimuli were tested against the chance level of 20% (100/5 = 20) by Boolean testing "Average Emotion Recognition Accuracy > 20?". Krippendorff's alpha was computed for each sequence to assess interrater reliability. See Table 3 for these data.
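The selection rule described here (chance level 100/5 = 20%, Boolean test per stimulus, and sequences kept only if all five stimuli pass) can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: the data layout, the function names and the toy accuracies are hypothetical, and the strict inequality follows the Boolean test as quoted.

```python
CHANCE = 100 / 5  # five response options, so chance level is 20 %

def above_chance(accuracy_pct, chance=CHANCE):
    """Boolean test from the text: average accuracy > 20?"""
    return accuracy_pct > chance

def intact_sequences(accuracy_by_stimulus, n_emotions=5):
    """Return sorted ids of sequences whose stimuli ALL exceed chance.

    `accuracy_by_stimulus` maps (sequence_id, emotion) -> accuracy in %.
    Sequences with fewer than `n_emotions` stimuli are not counted as intact.
    """
    by_sequence = {}
    for (seq, _emotion), acc in accuracy_by_stimulus.items():
        by_sequence.setdefault(seq, []).append(acc)
    return sorted(
        seq for seq, accs in by_sequence.items()
        if len(accs) == n_emotions and all(above_chance(a) for a in accs)
    )

# Made-up accuracies for two hypothetical sequences; NOT values from Table 3.
toy_accuracy = {
    (1, "joy"): 45.0, (1, "anger"): 52.0, (1, "fear"): 31.0,
    (1, "sadness"): 60.0, (1, "neutral"): 48.0,
    (2, "joy"): 40.0, (2, "anger"): 55.0, (2, "fear"): 18.0,  # below chance
    (2, "sadness"): 47.0, (2, "neutral"): 50.0,
}
toy_intact = intact_sequences(toy_accuracy)
```

With the toy values, sequence 2 is dropped as a whole because its fear stimulus falls below chance, which mirrors how 139 above-chance stimuli reduce to 22 intact sequences (110 stimuli) in the reported data.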

Discussion
We created an emotional dance movement stimuli-set for emotion psychology and related disciplines. It contains 30 dance sequences performed five times each, with a different intended emotional expressivity at each repetition (joy, anger, fear, sadness, and a neutral state), i.e., a total of 150 stimuli. All five emotion categories, as intended by the dancer, were recognized above chance level (chance: 20%; joy: 45%, anger: 48%, fear: 37%, sadness: 50%, neutral state: 51%). Fear had significantly lower emotion recognition rates than the rest of the emotion categories, but was still above chance. This finding is in accordance with previous literature in which the difficulty of recognizing fear from full-body movements has been reported 44 . One hundred and thirty-nine of the 150 stimuli were recognized above chance level. Respecting sequence membership, data showed that all five stimuli of a total of 22 sequences were recognized above chance level. This means that, to leave sequence membership intact, a set of 110 stimuli (22 sequences × 5 emotions) can be used from this emotional dance movement stimuli set.
Importantly, as a manipulation check, the neutral state stimuli (neutral expressivity) were rated as less intense than all other emotions, confirming that these neutral state stimuli were less emotionally expressive in intensity, as had been intended by the dancer. Thus, this category can be used as a control condition. We found no difference between anger and joy in terms of intensity, as has been reported before. Anger was rated as more intense than the stimuli intended to express sadness and fear, and joy was rated as more intense than neutral (joy = anger; joy > neutral; anger > fear/sadness > neutral).
Regarding our conjecture about implicit emotion recognition via aesthetic judgment, we found that participants' aesthetic judgment (beauty ratings) was indeed sensitive to the emotion intended by the dancer. Stimuli expressive of joy and sadness received the highest beauty ratings; fear and neutral expressivity received the lowest (joy > anger > fear > neutral, and sad > fear > neutral). Interestingly, the high-arousal emotions anger and joy were rated as equally intense, but participants' beauty ratings differed between the two emotions, with joyful movements being rated as more beautiful than angry movements. On the other hand, low-intensity stimuli expressing sadness were rated as more beautiful than other low-intensity stimuli, including neutral state and fearful stimuli. These results suggest that aesthetic judgment could indeed be conceptualized as a type of implicit emotion recognition task.
Interindividual difference measures of personality and aesthetic responsiveness did not significantly predict emotion recognition accuracy, except for conscientiousness, which predicted higher emotion recognition accuracy. Our engagement measure 'interest in task' predicted intensity ratings and beauty judgments, while beauty judgments were also negatively predicted by the personality trait negative emotionality.

Overall discussion and conclusion
It has long been argued that accurate emotion recognition from conspecifics confers an evolutionarily adaptive advantage to the individual 22,45,60,61 , yet results remain mixed 62,63 . Importantly, while there are different channels of emotional expressivity (face, voice, and the body), few validated full-body stimuli sets are available to test for emotion recognition effects and their possible links to broader cognitive function. This is an important shortcoming, especially as some research suggests that high recognition accuracy specifically for bodily expressions of emotion (as opposed to facial expressions of emotion) could be associated with negative psychosocial outcomes 2,10 .
Therefore, we here propose dance movements as stimuli for emotion science, to answer a range of questions about human full-body emotion perception 13,14,64-68 . Echoing this, we created and validated a set of 150 full-body dance movement stimuli for research in emotion psychology, affective neuroscience and empirical aesthetics. We provide emotion recognition rates, intensity ratings and aesthetic judgment values for each stimulus, and have demonstrated emotion recognition rates above chance for 139 of the 150 stimuli. We also provide first data to suggest that aesthetic judgment of this carefully controlled stimuli-set could serve as a useful implicit emotion recognition task.

Methods
Ethical approval for the experiment was in place through the Umbrella Ethics approved by the Ethics Council of the Max Planck Society (Nr. 2017_12). Informed consent was obtained from all participants and/or their legal guardians. The informed consent was given online through a tick-box system. All methods were performed in accordance with the relevant guidelines and regulations.
Participants: the dancer. One professional dancer from the Dresden Frankfurt Dance Company, Germany, collaborated and was remunerated as model for all stimuli. The dancer was a professional dancer trained
Participants: the observers. Participant characteristics of the 90 participants are set out in Table 4.
The sample size was determined as follows. The final number of stimuli (n = 173 including duplicates; see "Stimuli" section) would have been too many for participants to rate in one experiment. Therefore, stimuli were divided into 3 sets. Each set was rated by a different group of participants, and we planned to compare these three groups in terms of their ratings of 15 shared stimuli to evaluate interrater reliability. Sample size was determined separately for these groups, using G*Power 3.1 69 . Choosing the threshold of a large effect size of d = .80 70 , our sample size calculation for an independent samples t-test (effect size = .80; alpha = .05; power = .90) suggested a sample size of 28 per group. We tested 30 participants per group to ensure full randomization (30 is divisible by 5 emotions, 28 is not).
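As a rough cross-check of a calculation like this one, the per-group sample size for an independent samples t-test can be approximated with the standard normal-approximation formula n = 2((z_alpha + z_beta)/d)^2. The sketch below is an approximation under stated assumptions and does not reproduce G*Power, which works with the noncentral t distribution and therefore typically returns a slightly larger n. Whether the reported calculation was one- or two-sided is not stated in this section, so both variants are computed; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.90, two_sided=True):
    """Normal-approximation sample size per group for a two-group comparison.

    n = 2 * ((z_alpha + z_beta) / d) ** 2, rounded up.
    This is an approximation, not the exact noncentral-t computation.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Parameters taken from the text: d = .80, alpha = .05, power = .90.
n_one_sided = n_per_group(0.80, two_sided=False)
n_two_sided = n_per_group(0.80, two_sided=True)

# Group size actually tested, chosen so that it divides evenly by 5 emotions.
tested_per_group = 30
divides_evenly = tested_per_group % 5 == 0
```

The one-sided approximation lands close to the reported 28 per group, and the final check simply restates why 30 rather than 28 participants were tested per group.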

Materials. Stimuli.
We used N = 173 video clips of 6 s length showing a white silhouette dancer on a black background. Stimuli contained no facial information, costume, or music. Each clip was faded in and out.
A dancer choreographed 30 sequences of dance movements. Of the 30 sequences, five were Western classical ballet, the rest were Western contemporary dance. The length was 8 counts in dance theory, ~ 8 s. The dancer performed each sequence five times, with a different emotional expressivity at each repetition: joy, fear, anger, sadness and neutral state. A total of 173 stimuli were recorded instead of 150 (30 sequences × 5 emotions = 150 stimuli): when the dancer was not satisfied with her performance of a sequence, she asked to repeat it. Therefore, some of the stimuli were repeated. All 173 stimuli were included in the experiment to be able to select the "best" stimuli based on emotion recognition data. The 23 additional takes were deleted before analysis, by selecting the stimulus with the highest emotion recognition rate among duplicates. See Fig. 1 for an illustration of the stimuli creation process and a sample stimulus.
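The duplicate-removal rule described above (keep, for each sequence and emotion, the take with the highest emotion recognition rate) can be sketched as a simple grouping step. This is a hypothetical illustration: the record fields, the function name and the toy values are assumptions, and the handling of ties (first take wins) is not specified in the text.

```python
def keep_best_takes(takes):
    """Keep one take per (sequence, emotion): the one with the highest accuracy.

    `takes` is a list of dicts with keys 'sequence', 'emotion', 'take' and
    'accuracy' (in %). On ties the earlier take is kept (an assumption).
    """
    best = {}
    for t in takes:
        key = (t["sequence"], t["emotion"])
        if key not in best or t["accuracy"] > best[key]["accuracy"]:
            best[key] = t
    return list(best.values())

# Made-up records for illustration; NOT the study's recognition rates.
toy_takes = [
    {"sequence": 7, "emotion": "fear", "take": 1, "accuracy": 30.0},
    {"sequence": 7, "emotion": "fear", "take": 2, "accuracy": 43.3},
    {"sequence": 7, "emotion": "joy", "take": 1, "accuracy": 50.0},
]
toy_final = keep_best_takes(toy_takes)
```

Applied to the full set, a step of this kind would reduce the 173 recorded takes to the 150 final stimuli (30 sequences × 5 emotions).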
Questionnaires. Participants provided demographic information, and interindividual difference measures were collected. First, we used the personality measure Big Five Inventory Short version (BFI-S) 71,72 , which contains five subscales, namely Agreeableness, Conscientiousness, Extraversion, Negative Emotionality and Open-mindedness. Second, we used the Aesthetic Responsivity Assessment (AReA) 73 , which screens for sensitivity and engagement with the arts. It contains 14 items (answers were given on a 5-point Likert scale between 0 (never) and 4 (very often)) that split into three first-order factors: Aesthetic Appreciation (AA; how much an individual appreciates different types of art, like poetry, paintings, music, dance), Intense Aesthetic Experience (IAE; an individual's propensity to experience a subset of more intense aesthetic experiences like being moved, awe or the sublime), and Creative Behaviour (CB; an individual's propensity to actively engage in creative processes like writing, painting, music making or dancing). Participants had an average of 1.6 years (SD = 4.55) of dance experience, with many participants having no dance experience at all (81%, range 0-30).
Attention and engagement checks. A series of attention checks controlled for engagement: On two trials of the questionnaires, participants were asked "please press the central circle" and non-compliance led to exclusion. On two of the emotion recognition trials, cartoon videos were shown with very obvious emotional expressions (SpongeBob crying a river of tears; correct response: sad; and Mickey Mouse's head turning red and exploding; correct response: angry). Participants who rated these incorrectly were excluded. Finally, a question was added after the emotion and aesthetics rating tasks, "Did the videos play alright?" (0 = not at all; 5 = yes, all good). Participants who rated 3 or less were excluded.
A final question in the experiment asked participants how interesting they found the task they had just participated in. This is because previous research suggests that the personal interest in the task modulates task engagement and quality of responses 32,43,74 . We included this variable in the regression models.
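The exclusion criteria described in this section can be expressed as a single filter over participant records. The sketch below is illustrative only: the field names, the response labels ("sadness", "anger") and the toy participants are assumptions, not the structure of the actual survey export.

```python
def passes_checks(p):
    """Return True if a participant meets all inclusion criteria from the text.

    Assumed fields:
      circle_checks_passed : number of 'press the central circle' trials passed (0-2)
      cartoon_sad / cartoon_angry : responses to the two cartoon check videos
      playback_rating : answer to 'Did the videos play alright?' (0-5)
    """
    return (
        p["circle_checks_passed"] == 2       # both questionnaire attention checks
        and p["cartoon_sad"] == "sadness"    # crying cartoon must be rated sad
        and p["cartoon_angry"] == "anger"    # exploding-head cartoon must be rated angry
        and p["playback_rating"] > 3         # ratings of 3 or less lead to exclusion
    )

# Made-up participants for illustration only.
toy_participants = [
    {"id": "p1", "circle_checks_passed": 2, "cartoon_sad": "sadness",
     "cartoon_angry": "anger", "playback_rating": 5},
    {"id": "p2", "circle_checks_passed": 2, "cartoon_sad": "sadness",
     "cartoon_angry": "anger", "playback_rating": 3},   # playback problem
    {"id": "p3", "circle_checks_passed": 1, "cartoon_sad": "sadness",
     "cartoon_angry": "anger", "playback_rating": 5},   # missed a circle check
    {"id": "p4", "circle_checks_passed": 2, "cartoon_sad": "neutral",
     "cartoon_angry": "anger", "playback_rating": 4},   # failed cartoon check
]
toy_retained = [p["id"] for p in toy_participants if passes_checks(p)]
```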
Procedure. See Fig. 1 for the stimuli creation procedure.
To obtain normative values, the N = 173 video clips were divided into three sets and presented to three separate groups of 30 participants. Three randomly chosen sequences (= 15 stimuli) were included in all three sets for interrater reliability assessments between the three groups. Including the three 'shared' sequences, the resulting three stimuli sets were as follows: Set 1 included only the ballet sequences (seven sequences) and consisted of 39 stimuli (including 4 additional takes). Set 2 included contemporary dance sequences (15 sequences) and consisted of 84 stimuli (including nine additional takes), and Set 3 included contemporary dance sequences (14 sequences) and consisted of 80 stimuli (including 10 additional takes).
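The set sizes given above can be checked with a few lines of arithmetic: each set contains five stimuli per sequence plus its additional takes, and the three shared sequences are counted once per set. The sketch below only restates the numbers from the text; the variable names are illustrative, and it assumes that none of the additional takes belong to the shared sequences.

```python
EMOTIONS_PER_SEQUENCE = 5
SHARED_SEQUENCES = 3  # included in all three sets

# Sequence counts (including shared sequences) and additional takes per set.
SETS = {
    "Set 1": {"sequences": 7, "extra_takes": 4},
    "Set 2": {"sequences": 15, "extra_takes": 9},
    "Set 3": {"sequences": 14, "extra_takes": 10},
}

def set_size(spec):
    """Number of videos shown in one set."""
    return spec["sequences"] * EMOTIONS_PER_SEQUENCE + spec["extra_takes"]

sizes = {name: set_size(spec) for name, spec in SETS.items()}

# Shared sequences appear in three sets, so they are counted two times too many.
unique_sequences = sum(s["sequences"] for s in SETS.values()) - 2 * SHARED_SEQUENCES
unique_stimuli = sum(sizes.values()) - 2 * SHARED_SEQUENCES * EMOTIONS_PER_SEQUENCE
total_extra_takes = sum(s["extra_takes"] for s in SETS.values())
```

Under that assumption the per-set sizes (39, 84, 80) add up to the 173 unique recorded stimuli, 30 unique sequences and 23 additional takes reported elsewhere in the paper.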
The experiment was set up on Limesurvey®, where participants were also asked to read an information sheet and sign the consent form. Participants signed up for the rating experiment online via the Prolific© platform. The experiment began with the demographics questionnaire, followed by the emotion recognition task including beauty and intensity ratings, followed by the remaining questionnaires.
On each trial, participants were shown one dance video stimulus (randomized presentation), and then a forced-choice paradigm was used where participants were asked to select one emotion the dancer was intending to express (joy, anger, fear, sadness or neutral state). It was not possible to repeat the video after it had played one time. Two slider questions from 0 (not at all) to 100 (very much) probed for perceived intensity of the emotional expression and beauty of the movement (i.e., "How intensely was the emotion expressed?"/"How beautiful did you find the movement?"). "Intensity" was added as a proxy measure of "power" commonly used in emotion research. However, research participants find it difficult to rate "power" and we opted for "intensity" instead.
For a qualitative assessment, we added an open question, where participants were invited to indicate any other emotions that they perceived in the movement, by writing the emotion in a box (this data is not analysed in this manuscript). Participants were debriefed about the objectives of the experiment at the end.